
    Bandit-Based Genetic Programming

    We consider the validation of randomly generated patterns in a Monte-Carlo Tree Search program. Our bandit-based genetic programming (BGP) algorithm, with proven mathematical properties, outperformed a highly optimized handcrafted module of a well-known computer-Go program holding several world records in the game of Go.
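
The abstract does not spell out the bandit rule used by BGP; as a minimal sketch of the bandit side, a standard UCB1 rule can treat each candidate pattern as an arm (the win rates and constants below are hypothetical, not the paper's):

```python
import math
import random

def ucb1_select(counts, rewards, c=math.sqrt(2)):
    """Pick the arm maximizing mean reward plus a UCB1 exploration bonus."""
    total = sum(counts)
    best, best_score = None, float("-inf")
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm at least once
        score = rewards[arm] / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = arm, score
    return best

# Treat each candidate pattern as a bandit arm with an unknown win rate.
random.seed(0)
true_means = [0.3, 0.5, 0.7]        # hypothetical pattern win rates
counts = [0, 0, 0]
rewards = [0.0, 0.0, 0.0]
for _ in range(5000):
    arm = ucb1_select(counts, rewards)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward
# The strongest pattern ends up played far more often than the others.
```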

    Progress Rate in Noisy Genetic Programming for Choosing λ

    Recently, it has been proposed to use Bernstein races for implementing non-regression testing in noisy genetic programming. We study the population size of such a (1+λ) evolutionary algorithm applied to noisy fitness function optimization through a progress-rate analysis, and test it experimentally on a policy search application.
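
A (1+λ) evolutionary algorithm on a noisy fitness can be sketched as follows; this is a generic illustration (resampling to average out noise, a quadratic toy fitness), not the paper's analysis or its Bernstein-race machinery:

```python
import random

def noisy_fitness(x, rng, noise=0.3):
    """Noisy evaluation of -x^2; the true optimum is x = 0."""
    return -x * x + rng.gauss(0.0, noise)

def one_plus_lambda(lam=10, sigma=0.3, resamples=20, iters=200, seed=1):
    """(1+lambda) ES: keep the parent unless the best of lam children
    beats it; each point is evaluated `resamples` times and averaged
    to reduce the noise on the comparison."""
    rng = random.Random(seed)

    def avg_fit(x):
        return sum(noisy_fitness(x, rng) for _ in range(resamples)) / resamples

    parent = 5.0
    parent_fit = avg_fit(parent)
    for _ in range(iters):
        children = [parent + rng.gauss(0.0, sigma) for _ in range(lam)]
        best = max(children, key=avg_fit)
        best_fit = avg_fit(best)
        if best_fit > parent_fit:
            parent, parent_fit = best, best_fit
    return parent

x = one_plus_lambda()   # converges near the optimum at 0
```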

    On the Parallelization of Monte-Carlo Planning

    We provide parallelizations, with and without shared memory, of bandit-based Monte-Carlo planning algorithms, applied to the game of Go. The resulting algorithm won the first non-blitz game against a professional human player in 9x9 Go.

    Adding expert knowledge and exploration in Monte-Carlo Tree Search

    We present a new exploration term, more efficient than classical UCT-like exploration terms, which efficiently combines expert rules, patterns extracted from datasets, All-Moves-As-First values and classical online values. As this improved bandit formula does not solve several important situations (semeais, nakade) in computer Go, we present three other important improvements which are central to the recent progress of our program MoGo:
    – We show an expert-based improvement of Monte-Carlo simulations for nakade situations; we also emphasize some limitations of this modification.
    – We show a technique which preserves diversity in the Monte-Carlo simulation, which greatly improves the results in 19x19.
    – Whereas the UCB-based exploration term is not efficient in MoGo, we show a new exploration term which is highly efficient in MoGo.
    MoGo recently won a game with handicap 7 against a 9-dan professional player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1-dan professional player, Li-Chen Chien.
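
The abstract does not give MoGo's exact bandit formula; the sketch below uses the standard RAVE/AMAF blending schedule from the literature with assumed constants (`c_explore`, `rave_equiv` are hypothetical), to show how a fast offline-style estimate and a slow online value can be combined with an exploration bonus:

```python
import math

def node_score(q_online, n_online, q_rave, n_rave, n_parent,
               c_explore=0.3, rave_equiv=1000.0):
    """Score a move by blending the slow online mean value with the
    fast RAVE / All-Moves-As-First mean, plus a UCT-style exploration
    bonus. beta is close to 1 when there are few online visits (trust
    RAVE) and decays to 0 as online statistics accumulate."""
    beta = n_rave / (n_rave + n_online + n_rave * n_online / rave_equiv)
    value = (1.0 - beta) * q_online + beta * q_rave
    bonus = c_explore * math.sqrt(math.log(n_parent + 1) / (n_online + 1))
    return value + bonus

# With few online visits the RAVE estimate dominates; with many visits
# the online value takes over and the exploration bonus shrinks.
few = node_score(q_online=0.2, n_online=2, q_rave=0.8, n_rave=50, n_parent=100)
many = node_score(q_online=0.2, n_online=500, q_rave=0.8, n_rave=50, n_parent=1000)
```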

    Grid coevolution for adaptive simulations; application to the building of opening books in the game of Go

    This paper presents a successful application of parallel (grid) coevolution to the building of an opening book (OB) in 9x9 Go. Known sayings around the game of Go are rediscovered by the algorithm, and the resulting program was also able to credibly comment openings in professional games of 9x9 Go. Interestingly, beyond the application to the game of Go, our algorithm can be seen as a "meta"-level for the UCT algorithm: "UCT applied to UCT" (instead of "UCT applied to a random player", as usual), in order to build an OB. It is generic and could be applied as well to analyzing a given situation of a Markov Decision Process.

    A Principled Method for Exploiting Opening Books

    In the past we used a great deal of computational power and human expertise to obtain a very big dataset of good 9x9 Go games, in order to build an opening book, and we considerably improved the algorithm used for generating these games. Unfortunately, the results were not very robust, as (i) opening books are definitely not transitive, making non-regression testing extremely difficult; (ii) different time settings lead to opposite conclusions, because a good opening for a game with 10s per move on a single core is very different from a good opening for a game with 30s per move on a 32-core machine; and (iii) some very bad moves sometimes occur. In this paper, we formalize the optimization of an opening book as a matrix game, compute the Nash equilibrium, and conclude that a naturally randomized opening book provides optimal performance (in the sense of Nash equilibria); surprisingly, from a finite set of opening books, we can choose a distribution over these opening books so that this random solution performs significantly better than each of the deterministic opening books.
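
The idea of a randomized opening book as a Nash equilibrium of a matrix game can be illustrated with fictitious play on a toy payoff matrix (the matrix and names below are hypothetical, and fictitious play is just one generic solver, not necessarily the paper's method):

```python
import random

def fictitious_play(payoff, iters=20000, seed=0):
    """Approximate a Nash equilibrium of a zero-sum matrix game by
    fictitious play: each player repeatedly best-responds to the
    opponent's empirical mixture of past plays. Rows: our opening
    books; columns: the opponent's. Returns the row player's mixture."""
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row = rng.randrange(n_rows)
    col = rng.randrange(n_cols)
    for _ in range(iters):
        row_counts[row] += 1
        col_counts[col] += 1
        # Row player (maximizer) best-responds to the column mixture.
        row = max(range(n_rows),
                  key=lambda r: sum(payoff[r][c] * col_counts[c]
                                    for c in range(n_cols)))
        # Column player (minimizer) best-responds to the row mixture.
        col = min(range(n_cols),
                  key=lambda c: sum(payoff[r][c] * row_counts[r]
                                    for r in range(n_rows)))
    total = sum(row_counts)
    return [k / total for k in row_counts]

# Matching-pennies-like game: neither deterministic book is optimal,
# but the 50/50 mixture guarantees the game value.
payoff = [[1.0, 0.0],
          [0.0, 1.0]]
mix = fictitious_play(payoff)   # close to [0.5, 0.5]
```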

    Continuous Upper Confidence Trees

    Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, they are in particular surprisingly efficient in high-dimensional problems. It is known that they can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach does not work, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double progressive widening, which takes care of the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias by the first nodes) and which extends classical progressive widening; (iii) discuss its consistency and show experimentally that it performs well on the deceptive problem and on experimental benchmarks. We conjecture that the double progressive widening trick can be used in other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.
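
A minimal sketch of double progressive widening, under assumed parameters (the widening law `ceil(c * n^alpha)` with c = 1, alpha = 0.5 is a common choice, not necessarily the paper's): both the set of actions tried at a node and, for each action, the set of sampled next states are capped by a limit that grows with the visit count.

```python
import math
import random

def width_limit(n_visits, c=1.0, alpha=0.5):
    """Progressive widening law: allow at most ceil(c * n^alpha) children."""
    return math.ceil(c * n_visits ** alpha)

class DPWNode:
    """Double progressive widening: widen both the action set (for a
    continuous action space) and, per action, the sampled next states
    (for stochastic transitions) as the visit count grows."""
    def __init__(self):
        self.visits = 0
        self.actions = []    # first widening: actions tried at this node
        self.outcomes = {}   # second widening: action -> sampled next states

    def maybe_add_action(self, sample_action):
        if len(self.actions) < width_limit(self.visits):
            self.actions.append(sample_action())

    def maybe_add_outcome(self, action, sample_state):
        outs = self.outcomes.setdefault(action, [])
        if len(outs) < width_limit(self.visits):
            outs.append(sample_state())

rng = random.Random(0)
node = DPWNode()
for _ in range(100):
    node.visits += 1
    node.maybe_add_action(lambda: rng.uniform(-1.0, 1.0))    # continuous action
    node.maybe_add_outcome("a", lambda: rng.gauss(0.0, 1.0))  # stochastic state
# After 100 visits with alpha = 0.5, each set holds ceil(100**0.5) = 10 items,
# so every stored child still receives many simulations on average.
```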

    Combining Expert, Offline, Transient and Online Knowledge for Monte-Carlo Tree Exploration

    We combine, for Monte-Carlo tree exploration, machine learning at four different time scales:
    – online regret, through the use of bandit algorithms and Monte-Carlo estimates;
    – transient learning, through the use of rapid action value estimates (RAVE), which are learnt online and used for accelerating the exploration, and are thereafter gradually set aside as finer information becomes available;
    – offline learning, by data mining of datasets of games;
    – use of expert knowledge as prior information.
    The resulting algorithm is stronger than each element taken separately. We also exhibit an exploration-exploitation dilemma in Monte-Carlo tree exploration and obtain a very strong improvement by tuning the corresponding parameters.
